- These notes will help people wanting to use ISO 8859-1 stuff in
- WAIS. We have only done this on Suns running SunOS 4.1.3. To
- run it successfully under SunOS 4.1.1, one will need patches
- from Sun to correct problems with COLLDEF(8).
-
- 1- Install the collation table.
-
- You might already have one; check /etc/locale/LC_COLLATE to see
- what's in it. You probably have only C and default in there.
- Due to a bug in SunOS 4.1.3, it is useless to put the table
- in /usr/share/lib/locale/LC_COLLATE since it's ignored there. (This
- caused us much grief.)
-
- We decided to use external definitions of the 8859-1 table
- which we also put in /etc/locale/LC_COLLATE; we called this
- file iso_charmap but you can use anything as long as the same
- name appears in the input to COLLDEF(8).
-
- As root,
- /usr/etc/colldef <your_input_file /etc/locale/LC_COLLATE/iso_8859_1
- will install the table. Our copies of these files can be found at
- the end of this document. By the way, my version of colldef gives
- a checksum of 02951 24.
-
- One is not obliged to use the name iso_8859_1, but (for us at least)
- it tends to keep it simple since this is exactly what we are doing!
- If you do use something else, the lines containing
- setlocale(LC_CTYPE,"iso_8859_1");
- setlocale(LC_COLLATE,"iso_8859_1");
- will have to be changed to whatever is needed.
-
-
- 2- Recompile and reinstall WAIS.
-
- This part is easy. Just add -DLOCALE to the CFLAGS line in the
- primary Makefile and run make. What this does is replace every call
- to strcmp() with a call to strcoll() instead. It also adds calls to
- setlocale() in the main programs.
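As a rough illustration of what a -DLOCALE switch can do, here is a minimal sketch. The macro name WAIS_STRCMP and the helper sorts_before() are made up for this example; the real WAIS source may arrange the substitution differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro name, not taken from the WAIS source. */
#ifdef LOCALE
#define WAIS_STRCMP(a, b) strcoll((a), (b))   /* ordering from LC_COLLATE */
#else
#define WAIS_STRCMP(a, b) strcmp((a), (b))    /* plain byte ordering */
#endif

/* Returns nonzero when a sorts strictly before b under the selected comparison. */
int sorts_before(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return WAIS_STRCMP(a, b) < 0;
}
```

Compiling with cc -DLOCALE selects strcoll(); without the flag the same code falls back to strcmp(). Note that strcoll() only follows the collation table once setlocale() has selected it.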
-
- The only other point worth mentioning: even if we stuck in
- #include <locale.h>
- setlocale(LC_CTYPE,"iso_8859_1");
- setlocale(LC_COLLATE,"iso_8859_1");
- in the source, THIS DOES NOT SUFFICE! One must also have
- LC_CTYPE=iso_8859_1
- LC_COLLATE=iso_8859_1
- in the run-time environment. This also caused us much grief.
- This problem might be specific to Suns under 4.1.3.
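Since a wrong locale setup is easy to miss, it may help to check what setlocale() returns. The sketch below is an assumption on our part, not code from WAIS: init_locale() is a hypothetical helper, and these notes do not say whether the SunOS 4.1.3 problem actually shows up as a NULL return.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of WAIS): select the named locale for
 * character classification and collation, reporting failures on stderr.
 * Returns 0 when both categories were set, -1 otherwise.
 */
int init_locale(const char *name)
{
    int status = 0;

    assert(name != NULL);

    /* setlocale() returns NULL when the request cannot be honoured. */
    if (setlocale(LC_CTYPE, name) == NULL) {
        fprintf(stderr, "setlocale(LC_CTYPE, \"%s\") failed\n", name);
        status = -1;
    }
    if (setlocale(LC_COLLATE, name) == NULL) {
        fprintf(stderr, "setlocale(LC_COLLATE, \"%s\") failed\n", name);
        status = -1;
    }
    return status;
}
```

A main program could call init_locale("iso_8859_1") once at startup, before any comparison is made, and log or abort when it returns -1.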
-
- Note that Sun already ships what is needed for LC_CTYPE, but
- not for LC_COLLATE.
-
- One might also want to add to the stoplist. To more or less
- quote from the docs, when one generates an index, the
- -stop stop_list_file_name
- option permits the use of another list.
- We've called our (French ISO 8859) stoplist stop.uqam and
- put it in here.
-
- You might want to add the setlocale stuff to xwais if you're
- using it. In ~/x/xwais.c and ~/x/xwaisq.c, add
- #include <locale.h>
- setlocale(LC_CTYPE,"iso_8859_1");
- setlocale(LC_COLLATE,"iso_8859_1");
-
- 3- Gopher stuff.
-
- If WAIS is called by Gopher, gopherd must be changed for all this
- to work. The same 3 lines must be added to ~/gopherd/gopherd.c, i.e.
- #include <locale.h>
- setlocale(LC_CTYPE,"iso_8859_1");
- setlocale(LC_COLLATE,"iso_8859_1");
- and the run-time environment must contain
- LC_CTYPE=iso_8859_1
- LC_COLLATE=iso_8859_1
- for the same reasons as in WAIS.
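Because the run-time environment matters as much as the source change, a server could verify it at startup. This is only a sketch under our own assumptions: env_matches() and locale_env_ready() are hypothetical helpers and do not exist in gopherd or WAIS.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 when environment variable `var` is set
 * to exactly `expected`, 0 otherwise (including when it is unset).
 */
int env_matches(const char *var, const char *expected)
{
    const char *value;

    assert(var != NULL && expected != NULL);
    value = getenv(var);
    return value != NULL && strcmp(value, expected) == 0;
}

/* Returns 1 when both variables named in these notes hold `name`. */
int locale_env_ready(const char *name)
{
    return env_matches("LC_CTYPE", name) && env_matches("LC_COLLATE", name);
}
```

For example, a daemon might print a warning when locale_env_ready("iso_8859_1") returns 0, so that a missing variable is noticed before searches start returning odd matches.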
-
-
- Good luck!
-
- Sylvie St-Georges st-georges.sylvie@uqam.ca
-
- Our iso_charmap file
- --------------------------------------------------
- A-grave \xc0
- A-acute \xc1
- A-circu \xc2
- A-tilde \xc3
- A-diaer \xc4
- A-ring \xc5
- AE \xc6
- C-cedil \xc7
- E-grave \xc8
- E-acute \xc9
- E-circu \xca
- E-diaer \xcb
- I-grave \xcc
- I-acute \xcd
- I-circu \xce
- I-diaer \xcf
- ETH \xd0
- N-tilde \xd1
- O-grave \xd2
- O-acute \xd3
- O-circu \xd4
- O-tilde \xd5
- O-diaer \xd6
- MULT \xd7
- O-stroke \xd8
- U-grave \xd9
- U-acute \xda
- U-circu \xdb
- U-diaer \xdc
- Y-acute \xdd
- THORN \xde
- s-sharp \xdf
- a-grave \xe0
- a-acute \xe1
- a-circu \xe2
- a-tilde \xe3
- a-diaer \xe4
- a-ring \xe5
- ae \xe6
- c-cedil \xe7
- e-grave \xe8
- e-acute \xe9
- e-circu \xea
- e-diaer \xeb
- i-grave \xec
- i-acute \xed
- i-circu \xee
- i-diaer \xef
- eth \xf0
- n-tilde \xf1
- o-grave \xf2
- o-acute \xf3
- o-circu \xf4
- o-tilde \xf5
- o-diaer \xf6
- DIVIS \xf7
- o-stroke \xf8
- u-grave \xf9
- u-acute \xfa
- u-circu \xfb
- u-diaer \xfc
- y-acute \xfd
- thorn \xfe
- y-diaer \xff
- --------------------------------------------------
-
- Our input file to colldef
- --------------------------------------------------
-
- charmap /etc/locale/LC_COLLATE/iso_charmap
-
- substitute "\xc6" with "AE"
- substitute "\xdf" with "ss"
- substitute "\xe6" with "ae"
-
- order \x00;...;\x20;\x21;\x22;\x23;\x24;\x25;\x26;\x27;\x28;\x29;\
- \x2A;\x2B;\x2C;\x2D;\x2E;\x2F;0;1;2;3;4;5;6;7;8;9;\
- \x3A;\x3B;\x3C;\x3D;\x3E;\x3F;\x40;\
- \x5B;\x5C;\x5D;\x5E;\x5F;\x60;\x7B;\x7C;\x7D;\x7E;\x7F;\
- (A,<A-grave>,<A-acute>,<A-circu>,<A-tilde>,<A-diaer>,<A-ring>,\
- a,<a-grave>,<a-acute>,<a-circu>,<a-tilde>,<a-diaer>,<a-ring>);\
- (B,b);(C,<C-cedil>,c,<c-cedil>);(D,d);\
- (E,<E-grave>,<E-acute>,<E-circu>,<E-diaer>,\
- e,<e-grave>,<e-acute>,<e-circu>,<e-diaer>);\
- (F,f);(G,g);(H,h);\
- (I,<I-grave>,<I-acute>,<I-circu>,<I-diaer>,\
- i,<i-grave>,<i-acute>,<i-circu>,<i-diaer>);\
- (J,j);(K,k);(L,l);(M,m);(N,<N-tilde>,n,<n-tilde>);\
- (O,<O-grave>,<O-acute>,<O-circu>,<O-tilde>,<O-diaer>,<O-stroke>,\
- o,<o-grave>,<o-acute>,<o-circu>,<o-tilde>,<o-diaer>,<o-stroke>);\
- (P,p);(Q,q);(R,r);(S,s);(T,t);\
- (U,<U-grave>,<U-acute>,<U-circu>,<U-diaer>,\
- u,<u-grave>,<u-acute>,<u-circu>,<u-diaer>);\
- (V,v);(W,w);(X,x);(Y,<Y-acute>,y,<y-acute>,<y-diaer>);(Z,z)
- --------------------------------------------------
-
- --------------------------------------------------------------------------
- Jean-Pierre Kuypers <Kuypers@sri.ucl.ac.BE> writes the following:
-
- About the credits, we'll not forget the work of Sylvie St-Georges
- (Bonjour, Sylvie) on an ISO 8859-1-capable version of WAIS. Pascal Maes used
- it to make his proposal. The work is available on ftp.uqam.ca:/pub/WAIS/,
- where the files alire (French) and README.iso8859 (English) explain what
- to do and how.
-
- I tried to install the new jughead version on SunOS 4.1.1. After some
- trouble, as usual, I now have a jughead server able to search non-ASCII
- items.
-
- After uncommenting the CFLAG in the Makefile and making/installing, I had
- to do a number of things before it worked.
-
- - I ran the "colldef" command with Sylvie's two files. But I had to
- delete all the "-", "<", and ">" in these files to avoid "Syntax error"
- messages from colldef.
-
- - I had to set the LC_COLLATE environment variable (I already had
- LC_CTYPE) to iso_8859_1.
-
- - I had to make a link /etc/locale -> /usr/share/lib/locale. So I didn't put
- the files in /etc/locale/LC_COLLATE/ (as written by Sylvie), but in
- /usr/share/lib/locale/LC_COLLATE/.
-
- - I had to run "jughead -tB" to rebuild the correct index table. It's mandatory!
-
- After that, I have a more or less good jughead server.
- Curiously, e' (e-acute) and e` (e-grave) are seen as t and l. So re'seau
- and re`gles match as rtseau and rlgles. But they match correctly! To search
- for "re'seau", I may give "re'seau" or "rtseau". With other letters, such as
- a`, u`, e^, i^, o^, there is no problem. But E^ doesn't work.
-
-